A database of low-energy atomically precise nanoclusters

The chemical and structural properties of atomically precise nanoclusters are of great interest in numerous applications, but the structures of the clusters can be computationally expensive to predict. In this work, we present the largest database of cluster structures and properties determined using ab-initio methods to date. We report the methodologies used to discover low-energy clusters as well as the energies, relaxed structures, and physical properties (such as relative stability, HOMO-LUMO gap among others) for 63,015 clusters across 55 elements. We have identified clusters for 593 out of 1595 cluster systems (element-size pairs) explored by literature that have energies lower than those reported in literature by at least 1 meV/atom. We have also identified clusters for 1320 systems for which we were unable to find previous low-energy structures in the literature. Patterns in the data reveal insights into the chemical and structural relationships among the elements at the nanoscale. We describe how the database can be accessed for future studies and the development of nanocluster-based technologies.

at the DFT level of theory. The data set can be used to guide experimental synthesis of predicted nanoclusters, to guide searches for low-energy clusters in different chemical environments, to computationally screen for clusters suitable for a variety of applications, or to train machine learning models. Because the structural energies were obtained using a consistent computational method, the data set also serves as a direct source for comparative benchmark studies of different DFT or other electronic structure techniques in the context of atomic cluster modelling. All atomic structures and their calculated properties are openly accessible, so researchers worldwide can use them free of charge for further analysis.

Methods
We have used the following methods to populate the Quantum Cluster Database (QCD) with atomically precise nanoclusters:
1. We searched the literature for coordinates of previously discovered candidate low-energy clusters. The atomic structures are available on our QCD website (http://muellergroup.jhu.edu/qcd) and their literature sources are summarized in Table 1 and Fig. 5.
2. We used a genetic algorithm with ab-initio calculations. This method 36 was primarily used to identify structures of sizes and elements that are computationally cheap as determined by the number of valence electrons in the projector augmented wave potentials used (Supplementary Table 1), such as Mg, Li, Sb, Na, Ga, Si, Al, B, C, P and S.
3. We used a genetic algorithm accelerated by actively learned moment tensor potentials (MTP) [37][38][39][40] trained on-the-fly. This method has been used to search for clusters of a few sizes of Al 36, B, C, P and S.
4. We used correlations among the energies of the same structures for different elements to generate low-energy clusters of elements from known low-energy structures of chemically similar elements.
A brief description of each of these methods is provided below, with additional details in Supplementary Note 1.
Low-energy structures mined from existing literature. Many of the clusters in the QCD have been studied before, including in systematic DFT studies of small and large clusters across different elements. We collected atomic structures of clusters from publications that provide atomic coordinates of reported low-energy structures, as calculated using DFT, and from the Cambridge Cluster Database, which consists primarily of
Learning on the Fly (LOTF)-GA. We have recently developed a way to accelerate the genetic algorithm using machine-learned interatomic potentials trained on-the-fly using active learning [36][37][38]. The machine-learned interatomic potentials are used to quickly identify candidate low-energy clusters, which are further relaxed locally by DFT to refine the energies. This method has been used to identify low-energy structures for some sizes of Al, B, C, P and S. Additional details about this method can be found in reference 36.
Clusters built from low-energy structures of chemically similar elements. We have constructed additional low-energy cluster structures by taking advantage of the fact that for some elements there are strong correlations between the total energies of chemically similar cluster structures. Low-energy clusters of one element can be used as templates to quickly generate low-energy clusters of the positively correlated elements by rescaling the template in proportion to the ratio of nearest neighbor distances. To identify these relationships, we created 55 representative cluster structure prototypes in a two-step process.
In the first step, we used the genetic algorithm to identify low-energy structures for clusters of 5, 10, 15 and 20 atoms for Al, Be, Li, Mg, Na, Si, Ta, and Ti. These elements were chosen because they cover different parts of the periodic table and are computationally inexpensive relative to others because of their small numbers of valence electrons. The low-energy configurations are provided in Supplementary Note 5.
In the second step, we used these clusters as templates to create clusters of all the other elements. For each target element, the interatomic distances in the cluster were scaled by the ratio of the nearest neighbor distances of the target element and the template element. The nearest neighbor distances (Supplementary Table 2) are the bond lengths of each element in its most stable bulk form, retrieved from the Materials Project 26. To identify a chemically diverse set of elements, we used least-squares regression to express the DFT-calculated energies of unrelaxed clusters for each element as a linear combination of the energies of the remaining 54 elements. The residual errors of these fits provide a measure of the extent to which each element differs from the other 54 elements. We selected the 13 elements with the highest errors: B, Ba, Be, Ca, Cr, Cs, K, Li, Mg, Na, Rb, Sr, and Zn, as these are likely to have distinct ground state structures. We then used the genetic algorithm to search for low-energy structures for clusters of 10, 15, 20, 25, and 30 atoms for these 13 elements. The low-energy structures discovered by the genetic algorithm are shown in Fig. 1, and their coordinates are provided in Supplementary Note 6. These 65 structures were used as structural templates to recompute the correlations among energies of different elements, following the same procedure described above. The correlation values are plotted in the heat map of Fig. 2. A positive correlation between a pair of elements means that a cluster structure having high energy for one element also tends to have high energy for the other element, while a negative correlation indicates the reverse relationship.
www.nature.com/scientificdata
Fig. 2 The Pearson correlation coefficients between the energies of the set of template clusters of one element and the energies of the same set of template clusters of each of the other elements, sorted such that positively correlated elements are close to each other. Blue represents positive correlation, meaning structures having high energies for one element tend to also have high energies for the other element, and structures having low energies for one element tend to also have low energies for the other element. Red represents negative correlation, and white represents no correlation. The correlation values presented in this figure can be downloaded from the header of the QCD website homepage.
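As an illustration of how such a correlation table can be computed, the following sketch applies NumPy's `corrcoef` to a small, made-up energy matrix in which each row holds the energies of the same template structures for one element. The element labels, the energies, and the helper `most_correlated` are hypothetical placeholders and do not come from the QCD.

```python
import numpy as np

# Hypothetical energies (eV) of the same four template structures,
# evaluated for three placeholder elements; not real QCD data.
elements = ["A", "B", "C"]
energies = np.array([
    [-1.0, -2.0, -3.0, -4.0],   # element A
    [-2.1, -3.9, -6.2, -8.0],   # element B: follows the same trend as A
    [-4.0, -3.0, -2.0, -1.0],   # element C: opposite trend to A
])

# Pearson correlation coefficient between every pair of elements (rows).
corr = np.corrcoef(energies)

def most_correlated(target: str) -> str:
    """Return the other element whose template energies correlate best with `target`."""
    i = elements.index(target)
    row = corr[i].copy()
    row[i] = -np.inf            # exclude the element itself
    return elements[int(np.argmax(row))]
```

In this toy example, A and B are strongly positively correlated, whereas A and C are perfectly anti-correlated, so B would be the preferred source of templates for A.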
To evaluate the diversity of the 65 template structures, we compared the clusters using a structural similarity score as described in reference 97, where perfectly similar structures have a score of 0.0, and we consider structures with a score above 0.3 to be dissimilar. Across the 5 different sizes and 13 different elements, only five pairs have a similarity score less than 0.3, indicating that the remaining pairs of structures are structurally distinct.
After discovering low-energy clusters with the genetic algorithm, we filled gaps in the database (i.e., elements and sizes for which no clusters were available) using the correlations among the energies of elements (as shown in Fig. 2). For each gap (an element-size pair with no clusters), we identified the most strongly correlated element and used its 5 lowest-energy clusters of the same size as templates to generate new clusters that were likely to have low energy. The templates were rescaled using the ratio of bulk nearest-neighbor bond lengths, as described above.
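The rescaling step can be sketched as a uniform scaling of the template about its centroid. This is a minimal illustration, assuming Cartesian coordinates in angstroms; the function name `rescale_template` and the example distances are hypothetical, and the actual nearest-neighbor distances are those of Supplementary Table 2.

```python
import numpy as np

def rescale_template(coords, d_template, d_target):
    """Scale every interatomic distance by d_target / d_template.

    coords:     (N, 3) Cartesian coordinates of the template cluster (angstroms).
    d_template: bulk nearest-neighbor distance of the template element.
    d_target:   bulk nearest-neighbor distance of the target element.
    The centroid of the cluster is left unchanged.
    """
    coords = np.asarray(coords, dtype=float)
    centroid = coords.mean(axis=0)
    return centroid + (coords - centroid) * (d_target / d_template)

# Example with made-up numbers: a dimer with a 2.0 angstrom bond,
# rescaled for a target element whose bond length is 1.5 times longer.
dimer = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
scaled = rescale_template(dimer, d_template=2.0, d_target=3.0)
```

The rescaled structure is only a starting point; it would still need to be relaxed with DFT before its energy is meaningful.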
DFT calculations. All DFT local energy minimizations were carried out using the Vienna ab initio Simulation Package 98 (VASP) with the Perdew-Burke-Ernzerhof 99 (PBE) generalized gradient approximation exchange-correlation functional. We found that VASP was particularly efficient for clusters with a large number of atoms, which consumed the greatest amount of computational resources. We used the projector-augmented wave 100 method, with the pseudopotentials and the corresponding default cutoff energies listed in Supplementary Table 1. The convergence criterion for electronic self-consistency was set to 10⁻⁵ eV per cluster. Structures were optimized using the conjugate gradient algorithm 101,102 or the RMM-DIIS algorithm 103 as implemented in VASP until all the atomic forces were less than 0.1 eV/Å. Low-energy clusters within 1 eV of the lowest-energy cluster of each system and all clusters collected from the literature (31,911 clusters in total) were re-optimized with a tighter force-convergence criterion of 0.025 eV/Å to increase the accuracy of the low-energy isomers. All calculations were run at the gamma point with spin polarization. The magnetic moments were initialized as 1 μB/atom for non-magnetic elements. For magnetic elements, local magnetic moments were written out for most calculations, and the detailed initialization scheme is described in a separate section below. Gaussian smearing 104 with σ = 0.0001 eV was used to achieve high accuracy when calculating final energies (2283 of the 63,015 clusters used σ = 0.001 eV). To accelerate convergence for some clusters, a two-step minimization scheme was adopted, with a smearing of 0.1 eV in the initial step for faster convergence and a smaller value of 0.0001 eV in the final step. Symmetry was turned off for all DFT calculations to increase the chance of completing the calculations successfully.
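For orientation, the settings above could be expressed with standard VASP INCAR tags roughly as in the sketch below. This is an assumed mapping rather than the input actually used: in particular, the choice of `IBRION = 2` (conjugate gradient) over `IBRION = 1` (RMM-DIIS) and of `ISYM = 0` to switch symmetry off are assumptions, and the helper names are hypothetical. The INCAR files used for each cluster are available from the QCD website and the NOMAD repository.

```python
def incar_settings(n_atoms, tight=False, initial_moment=1.0, sigma=0.0001):
    """Return a dict of VASP INCAR tags approximating the settings in the text."""
    return {
        "EDIFF": 1e-5,                          # electronic convergence (eV)
        "EDIFFG": -0.025 if tight else -0.1,    # negative value: force criterion (eV/A)
        "IBRION": 2,                            # assumption: conjugate gradient
        "ISPIN": 2,                             # spin-polarized calculation
        "MAGMOM": f"{n_atoms}*{initial_moment}",  # initial moment per atom (mu_B)
        "ISMEAR": 0,                            # Gaussian smearing
        "SIGMA": sigma,                         # smearing width (eV)
        "ISYM": 0,                              # assumption: symmetry switched off
    }

def two_step_schedule(n_atoms):
    """Coarse smearing first (0.1 eV), then the final value (0.0001 eV)."""
    return [incar_settings(n_atoms, sigma=0.1),
            incar_settings(n_atoms, sigma=0.0001)]

def to_incar(settings):
    """Render the settings as INCAR-style 'TAG = value' lines."""
    return "\n".join(f"{tag} = {value}" for tag, value in settings.items())
```

Calling `to_incar(incar_settings(13, tight=True))` would give the text of an INCAR for the tighter re-optimization of a 13-atom cluster under these assumptions.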
We found that the inclusion of spin-orbit coupling (SOC) had little effect on the ranking of low-energy structures. To keep the DFT settings consistent, we did not include SOC-predicted total energies and atomic structures in the QCD.
All DFT calculations were performed using VASP, which can only perform periodic calculations, so each cluster is in effect surrounded by translationally equivalent clusters. Hence it is essential to use a simulation cell that is sufficiently large to avoid interactions among periodic images. For all elements, we required the minimum distance between atoms in periodic images to be greater than 10 Å. Additionally, for elements in Groups 1A and 2A of the periodic table, the minimum distance between neighboring images must also be greater than 3.5 times the nearest-neighbor distances listed in Supplementary Table 2. If, after relaxation, the minimum distance between neighboring images shrank below these values, we increased the supercell size and ran the DFT calculation again. We found that these "box size" criteria are sufficient to reach energy convergence within 2 meV/atom in all 1135 tested cases (see Supplementary Fig. 7 and Supplementary Note 4 for more details), with a root mean squared error of 0.118 meV/atom.
Workflow. We identified candidate low-energy cluster structures using one of the four methods listed above. DFT calculations were performed on these cluster structures before adding them to the database. An outline of the high-throughput workflow used in these DFT calculations is provided in Fig. 3. We first initialized the calculations of magnetic elements with appropriate magnetic moments (details in the next section). Then we ran ionic relaxations and checked convergence. For calculations that did not converge, we adjusted input parameters such as the step size of the optimization algorithm and the charge mixing parameters and reran them until convergence was achieved.
The atoms sometimes form periodic configurations that correspond to nanowires or slabs. We filtered out these types of structures by discarding clusters that had a minimum distance between periodic images smaller than 1.5 times the atomic nearest neighbor distance. We also screened for discontiguous clusters using this same criterion and discarded any discontiguous clusters that were identified.
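Both the box-size criteria and the nanowire/slab filter depend on the minimum distance between a cluster and its periodic images. The sketch below shows one way to evaluate that distance with NumPy. It is an illustration only: the function names are hypothetical, and checking just the 26 adjacent images assumes that the cluster is smaller than the simulation cell.

```python
import itertools
import numpy as np

def min_image_distance(coords, cell):
    """Smallest distance between an atom and any atom in a neighboring periodic image.

    coords: (N, 3) Cartesian positions (angstroms).
    cell:   (3, 3) lattice vectors as rows (angstroms).
    """
    coords = np.asarray(coords, dtype=float)
    cell = np.asarray(cell, dtype=float)
    best = np.inf
    for shift in itertools.product((-1, 0, 1), repeat=3):
        if shift == (0, 0, 0):
            continue  # skip the cluster itself
        image = coords + np.asarray(shift) @ cell
        # All pairwise distances between original atoms and image atoms.
        diff = coords[:, None, :] - image[None, :, :]
        best = min(best, np.linalg.norm(diff, axis=-1).min())
    return best

def box_is_large_enough(coords, cell, d_nn, is_group_1a_or_2a):
    """Apply the 10 angstrom rule and, for Groups 1A/2A, the 3.5 x d_nn rule."""
    d = min_image_distance(coords, cell)
    if d <= 10.0:
        return False
    return not (is_group_1a_or_2a and d <= 3.5 * d_nn)

def looks_periodic(coords, cell, d_nn):
    """Flag nanowire- or slab-like structures (image closer than 1.5 x d_nn)."""
    return min_image_distance(coords, cell) < 1.5 * d_nn
```

For example, a dimer with a 2 Å bond in a 15 Å cubic cell has a minimum image distance of 13 Å, which passes the 10 Å rule but would fail the additional Group 1A/2A rule if the nearest-neighbor distance were 4 Å.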
To ensure that the Quantum Cluster Database contains only unique clusters for a given element and size, when two clusters had a structural similarity score 97 less than 0.3, the cluster with the higher energy was discarded. If the higher-energy cluster was from the literature, the appropriate literature references were assigned to the structurally similar lower-energy cluster. All filters that ensure the quality of the clusters in the database are summarized in Fig. 3.
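The uniqueness filter can be sketched as a greedy pass over the clusters of one system, sorted by energy. The sketch assumes a user-supplied `similarity` function that returns the score of reference 97 (0.0 for identical structures); that function, the dictionary layout, and the name `deduplicate` are hypothetical.

```python
def deduplicate(clusters, similarity, threshold=0.3):
    """Keep the lowest-energy member of each group of similar clusters.

    clusters:   list of dicts with at least 'energy' and 'references' (a list).
    similarity: callable(cluster_a, cluster_b) -> score, where 0.0 means identical.
    References of a discarded cluster are passed on to the cluster that replaces it.
    """
    kept = []
    for cluster in sorted(clusters, key=lambda c: c["energy"]):
        match = next((k for k in kept if similarity(k, cluster) < threshold), None)
        if match is None:
            # New structure type: keep a copy so the input is not modified.
            kept.append({**cluster, "references": list(cluster["references"])})
        else:
            # Duplicate of a lower-energy cluster: discard it, keep its references.
            for ref in cluster["references"]:
                if ref not in match["references"]:
                    match["references"].append(ref)
    return kept
```

Because the clusters are processed from lowest to highest energy, the retained representative of each group is always the lowest-energy one.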
The properties and metadata described in the Data Records section were calculated for each cluster and stored in a PostgreSQL database. Finally, the data are displayed on the Quantum Cluster Database website (http://muellergroup.jhu.edu/qcd) and output as a JSON file and a CSV file.
Treatment of magnetic clusters. The final magnetic state of a cluster may depend on the atomic magnetic moments used to initialize the calculation. The final magnetic state is particularly likely to be non-zero for elements with non-zero magnetic moments in their elemental bulk phase, specifically Fe, Mn, Co, Ni, Ru, Rh, V, Cu, and Cr. For elements other than these, we initialized spin-polarized calculations with the default magnetic moment (1 μB/atom). For the magnetic elements, we performed a benchmark on 2228 clusters with 3 to 55 atoms selected from an early version of the QCD and initialized spin-polarized calculations with 4 different magnetic moments, namely 1 μB/atom, 2 μB/atom, 3 μB/atom, and 5 μB/atom, to evaluate the effect of the initial magnetic moments on the final magnetic states and the total energies. We found that the final magnetic states of Fe, Mn, Ru, Rh, V, and Cr clusters are particularly likely to depend on the initial magnetic moments, whereas for Cu and Co the initialization with 3 μB/atom relaxed into the lowest-energy configurations in almost all of the benchmarked clusters. For Ni clusters, the final states were independent of the initialization, so we used the default 1 μB/atom in the QCD calculations. Supplementary Table 4 shows the effects of different initial magnetic moments on the final magnetic states. To reduce the chance of missing the correct final magnetic states, multiple initial magnetic moments were used for Fe, Mn, Ru, Rh, V, and Cr clusters, and the calculations yielding the lowest total energies were included in the QCD. The set of initial magnetic moments for each element was chosen such that it led to the lowest-energy states for more than 97% of all benchmarked clusters of the corresponding element. Table 2 lists the selected sets of magnetic moments for the six elements, together with the single initialization value for Co, Cu, and Ni.
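One way to choose such a set of initial moments from benchmark data is sketched below: it searches for the smallest subset of candidate moments that reaches the lowest energy for more than 97% of the benchmarked clusters. The exhaustive search, the energy tolerance `tol`, and the data layout are assumptions made for illustration; the text does not specify how the sets in Table 2 were derived.

```python
from itertools import combinations

def select_initial_moments(benchmark, candidates=(1, 2, 3, 5),
                           coverage=0.97, tol=1e-3):
    """Pick the smallest set of initial moments that covers the benchmark.

    benchmark: {cluster_id: {initial_moment: final_energy_eV}}, with every
               candidate moment present for every cluster.
    A cluster counts as covered if at least one moment in the set reaches
    the lowest energy found for that cluster, within `tol` eV.
    """
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            covered = sum(
                min(energies[m] for m in subset) <= min(energies.values()) + tol
                for energies in benchmark.values()
            )
            if covered > coverage * len(benchmark):
                return subset
    return tuple(candidates)

def best_result(energies_by_moment):
    """Return (initial_moment, energy) of the lowest-energy calculation."""
    return min(energies_by_moment.items(), key=lambda item: item[1])
```

In production runs, `best_result` corresponds to keeping only the calculation with the lowest total energy among those started from the selected moments.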

Data Records
We have created a website at http://muellergroup.jhu.edu/qcd to host the database. It provides download links to the correlation table (Fig. 2) and an archive of all relaxed cluster structures, as well as an individual webpage for each cluster with an interactive visualization and tabulated cluster properties (discussed in the next section). The input and output files of the DFT calculations of all 63,015 clusters are publicly available (licensed under CC-BY-4.0) in the NOMAD database 105 at https://doi.org/10.17172/NOMAD/2023.02.01-1 106. Each individual cluster webpage on the QCD website links to the corresponding data entry in the NOMAD database, where the DFT files can be downloaded.
File format. Properties of all clusters are available for download as a JSON file and as a .csv file on the Quantum Cluster Database website. In the JSON file, each cluster is stored as a key/value pair with "cluster_id" as the key and an object composed of all quantities listed in Table 3 as the value. Within the object, the properties of the corresponding cluster are also stored as key/value pairs, with the keys being those listed in Table 3. The columns of the .csv file correspond to the keys described in Table 3. The input and output VASP files for the DFT calculation of each cluster are available in the NOMAD repository at https://doi.org/10.17172/NOMAD/2023.02.01-1.
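A file with this layout can be read with Python's standard `json` module, as in the sketch below. The sample data, the cluster IDs, and the field names `element`, `n_atoms`, and `relative_energy` are hypothetical stand-ins; the actual keys are those listed in Table 3.

```python
import json

# Tiny stand-in for the downloadable JSON file. IDs, field names, and values
# are hypothetical; the real keys are those listed in Table 3.
SAMPLE = """
{
  "1001": {"element": "Al", "n_atoms": 13, "relative_energy": 0.0},
  "1002": {"element": "Al", "n_atoms": 13, "relative_energy": 0.21},
  "2001": {"element": "Na", "n_atoms": 8, "relative_energy": 0.0}
}
"""

def select_clusters(data, element, n_atoms):
    """Return (cluster_id, record) pairs for one element-size system, lowest energy first."""
    matches = [
        (cluster_id, record)
        for cluster_id, record in data.items()
        if record["element"] == element and record["n_atoms"] == n_atoms
    ]
    return sorted(matches, key=lambda pair: pair[1]["relative_energy"])

data = json.loads(SAMPLE)
al13 = select_clusters(data, "Al", 13)
```

With the real file, `json.loads(SAMPLE)` would be replaced by `json.load` on the downloaded file, and the field names would be adapted to the keys in Table 3.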

Properties.
For each cluster of a given number of atoms N and element type k, the database contains the energy relative to the lowest-energy structure of size N and species k, the formation energy with respect to the lowest-energy cluster of size N−1 of species k (Eq. (1)), the formation energy with respect to the lowest-energy cluster of size N+1 of the same species (Eq. (2)), the HOMO-LUMO gap, the number of valence electrons considered by DFT, the magnetic moment, a list of similar structures within the Quantum Cluster Database, a list of literature references for the cluster ("http://muellergroup.jhu.edu/qcd" if it was generated by GA or from low-energy clusters of chemically similar elements), the coordinates (downloadable in XYZ format and the VASP POSCAR format), and an interactive visualization of the cluster. The formation energies are calculated using the following equations:
$$E_{\mathrm{f}}^{N-1} = E_N - E_{N-1} - E_{\mathrm{atom}} \qquad (1)$$
$$E_{\mathrm{f}}^{N+1} = E_N - E_{N+1} + E_{\mathrm{atom}} \qquad (2)$$
where $E_N$ is the energy of this cluster of size N, $E_{N-1}$ is the energy of the lowest-energy cluster of size N−1, $E_{N+1}$ is the energy of the lowest-energy cluster of size N+1, and $E_{\mathrm{atom}}$ is the energy of an isolated atom. The energies for isolated atoms used in these calculations are provided in Supplementary Table 3. The sizes of simulation cells determine the distances between periodic images and can be important for reproducing our results. Therefore, on the structure-view page of each cluster on the QCD website, we provide a link to the relaxed structure in the VASP POSCAR format, which contains the cell lattice vectors and from which the lengths of the simulation cell can be readily calculated.
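As a worked illustration of the two formation energies, the sketch below assumes the convention that each is the energy of the cluster minus the energy of the reference cluster, corrected by one isolated atom so that both sides contain N atoms. This sign convention and the numbers in the example are assumptions, not values from the QCD.

```python
def formation_energies(e_n, e_n_minus_1, e_n_plus_1, e_atom):
    """Formation energies (eV) of a cluster of size N under the assumed convention.

    e_n:         total energy of the cluster of size N.
    e_n_minus_1: total energy of the lowest-energy cluster of size N-1.
    e_n_plus_1:  total energy of the lowest-energy cluster of size N+1.
    e_atom:      energy of an isolated atom.
    """
    # Relative to the (N-1)-atom cluster plus one isolated atom.
    wrt_smaller = e_n - e_n_minus_1 - e_atom
    # Relative to the (N+1)-atom cluster minus one isolated atom.
    wrt_larger = e_n - e_n_plus_1 + e_atom
    return wrt_smaller, wrt_larger

# Made-up energies in eV, for illustration only.
wrt_smaller, wrt_larger = formation_energies(
    e_n=-10.0, e_n_minus_1=-8.0, e_n_plus_1=-12.5, e_atom=-0.5
)
```

With these made-up numbers, adding an atom to the (N−1)-atom cluster lowers the energy by 1.5 eV, and removing an atom from the (N+1)-atom cluster costs 2.0 eV.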
Table 4 summarizes the links through which readers can access the information discussed in this work.

Technical Validation
Comparison of literature clusters and newly reported QCD clusters. For a given cluster size and element, we compared the lowest-energy cluster from the literature against the lowest-energy cluster newly reported in the database to assess which had lower energy. There are 1595 systems for which there is at least one literature structure in the database. For 593 of these systems, the database contains a new lowest-energy cluster that is lower in energy by at least 1 meV/atom (Fig. 4).
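The comparison amounts to converting the total-energy difference of each system to meV/atom and applying the 1 meV/atom threshold. The sketch below illustrates this with made-up numbers; the tuple layout and the function name `count_improved` are hypothetical.

```python
def count_improved(systems, threshold_mev_per_atom=1.0):
    """Count systems whose new cluster beats the literature by at least the threshold.

    systems: iterable of (n_atoms, e_literature, e_new), where the energies are
             total energies in eV of the lowest-energy cluster from each source.
    """
    count = 0
    for n_atoms, e_literature, e_new in systems:
        # Energy gain per atom, converted from eV to meV.
        gain = (e_literature - e_new) / n_atoms * 1000.0
        if gain >= threshold_mev_per_atom:
            count += 1
    return count

# Made-up example: only the first system improves by at least 1 meV/atom.
example = [
    (10, -20.000, -20.020),   # 2.0 meV/atom lower
    (10, -20.000, -20.005),   # 0.5 meV/atom lower: below the threshold
    (5, -9.000, -8.900),      # new cluster is higher in energy
]
n_improved = count_improved(example)
```

For scale, the 593 improved systems reported above correspond to about 37% of the 1595 systems with literature structures.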
The Quantum Cluster Database contains 1379 structure types or templates (i.e., relative arrangements of atoms) that were not previously reported in the literature (Fig. 5). The 1379 templates were identified from the set of all clusters with calculated energies within 1 meV/atom of the lowest energy cluster with the same element and size. In comparison, there are 582 templates of low-energy clusters from the literature.
Before our work, there were 1595 cluster systems, or approximately 55% of the total 2915 systems, that had at least one structure whose atomic coordinates are available in the literature (including the CCD). With the Quantum Cluster Database, the percentage increases to 100%. Table 5 provides a summary of the statistics of cluster systems and the total number of clusters from different approaches. We note that the sum of the numbers of clusters from different sources does not equal the total number of clusters in the QCD because some clusters are found in multiple sources, as shown in Table 5.
Magnetization of magnetic elements. We performed a more in-depth analysis of the final magnetic moments of the nine magnetic elements listed above (Fe, Mn, Co, Ni, Ru, Rh, V, Cu, and Cr). We performed our analysis on 12,171 DFT calculations which listed local atomic magnetic moments. We calculated, for each cluster, the ratio of opposite local magnetic moments. Figure 6 shows the distribution of the ratio of opposite local magnetic moments across the investigated cluster sizes. Although Cr is the only antiferromagnetic element in the bulk phase at room temperature, we found antiferromagnetic nanoclusters with significant local magnetic moments (≥0.5 μB in both the spin-up and spin-down directions) in five elements: Cr, Mn, Ru, Rh, and V. As the cluster size increases, the lower bound of the opposite moment ratios in Cr gradually increases, indicating that an increasing fraction of the spins of Cr atoms tend to order antiparallel. This is likely because there are more highly coordinated atoms as the cluster size increases, whose local atomic environments mimic that of the bulk phase. Aside from the antiferromagnetic isomers with a ratio of opposite moments close to 1, there are many clusters with ratios between 0 and 1 for the five elements (Cr, Mn, Ru, Rh and V), exhibiting states similar to ferrimagnetic configurations, which might result from geometric frustration. Because of the complex geometry of nanoclusters and the lack of a periodic lattice, it is hard for the spins of neighboring atoms to order perfectly in an antiparallel pattern. The ferromagnetic elements Fe, Co, and Ni remain ferromagnetic as atomic clusters. Cu mostly alternates between non-magnetic and ferromagnetic, with total magnetic moments of 0 and 1 μB for even- and odd-sized clusters, respectively, because of the odd number of valence electrons. For the few cases of Cu where the ratio of opposite moments is large, the magnitudes of the local moments are very small, suggesting a non-magnetic nature.
Table 3. Keys, types of data, and description of the QCD data in the JSON file and .csv format. *Semicolons are used instead of line breaks.
Effect of spin-orbit coupling. We investigated the effect of spin-orbit coupling (SOC) on heavy-metal elements by performing additional PBE + SOC calculations on the lowest-energy clusters of 11 heavy-metal elements, namely Au, Bi, Hf, Hg, Ir, Os, Pb, Pt, Re, Ta, and Tl, selected based on the work by Piotrowski et al. 42, with sizes ranging from 3 to 55 atoms. Additionally, we chose the six (where available) lowest-energy isomers for small (size 10), medium (size 30), and large (size 55) clusters to study the effect of SOC on relative ordering. The energies calculated by PBE + SOC correlate linearly with the energies computed by PBE, as shown in Fig. 7. We also developed a cheap proxy to approximate PBE + SOC energies from PBE energies for these 11 elements using least-squares regression. The conversion factors (slope and intercept) for converting PBE energies to PBE + SOC energies for these elements are provided in Supplementary Fig. 3. We also found that the use of SOC has little effect on the energy rankings of isomers (Supplementary Figs. 4-6). For three small cluster systems, Pb10, Tl10, and Hf10, the PBE + SOC relaxed structures are geometrically dissimilar to the relaxed PBE structures, with similarity scores larger than 0.3, and are therefore excluded from the comparisons in Supplementary Figs. 4-6.
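A linear proxy of this kind can be obtained with an ordinary least-squares line fit, for example with NumPy's `polyfit`. The sketch below uses synthetic energies generated from a known slope and intercept; the numbers and function names are illustrative only, and the actual conversion factors are those given in Supplementary Fig. 3.

```python
import numpy as np

def fit_soc_proxy(e_pbe, e_pbe_soc):
    """Least-squares fit of E(PBE+SOC) = slope * E(PBE) + intercept for one element."""
    slope, intercept = np.polyfit(e_pbe, e_pbe_soc, deg=1)
    return slope, intercept

def predict_soc(e_pbe, slope, intercept):
    """Approximate PBE+SOC energies from PBE energies using the fitted line."""
    return slope * np.asarray(e_pbe, dtype=float) + intercept

# Synthetic data (eV), generated from a known line for illustration only.
e_pbe = np.array([-10.0, -20.0, -30.0, -40.0])
e_soc = 1.05 * e_pbe - 0.2
slope, intercept = fit_soc_proxy(e_pbe, e_soc)
```

Because the synthetic data lie exactly on a line, the fit recovers the slope of 1.05 and the intercept of −0.2 eV; real data would scatter around the fitted line.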

Usage Notes
The homepage of the QCD website (http://muellergroup.jhu.edu/qcd) provides a view of the periodic table and downloadable links to an archive of all relaxed structures and the files summarizing properties of all clusters (in JSON and CSV format). When clicking on one of the 55 elements reported in this work, the other elements will be colored according to the energy correlations (Fig. 8a) and a list of clusters with 3 to 55 atoms of this element will be displayed below the periodic table (Fig. 8b). By default, clusters within 200 meV from the lowest-energy cluster of each size will be displayed. This range can be changed in the input box right beneath the periodic table.
For each cluster in the list, a snapshot of the relaxed structure is provided, along with the energy relative to the lowest-energy cluster of the corresponding system. Clicking on a particular cluster from the list brings users to the structure-view page of this specific structure (Fig. 8c). This webpage tabulates the properties of the cluster listed in Table 3 (except for the DFT total energy and the atomic coordinates in POSCAR format). Beside the table, the webpage provides an interactive visualization of the relaxed structure and download links to the structure in XYZ format and in the VASP POSCAR format, the VASP input file (INCAR), and a file containing the references of this cluster in BibTeX format. The entire set of input and output files of this cluster can also be downloaded via the link to the NOMAD repository in the header of the individual entry page.
Table 5. Statistics of the number of cluster systems and cluster structures from different sources. *Clusters from the CCD which are geometrically similar to clusters from the literature could also be marked as from the literature and vice versa. This makes the sum of cluster systems from these two sources larger than the 1595 systems colored as from the CCD or other literature in Fig. 5. Similarly, the numbers of clusters from different sources may not sum to the total number of clusters in the QCD because some clusters may be found in multiple sources.

Code Availability
The implementations of the DFT and MTP genetic algorithms used to search for low-energy structures are available via GitLab: https://gitlab.com/muellergroup/cluster-ga. The scripts and code for managing the QCD, for example merging new clusters into the QCD, updating existing clusters with DFT calculations using updated parameters, and generating the metadata of clusters listed in Table 3, are also open-sourced at https://gitlab.com/muellergroup/qcd_mgmt.